7 research outputs found

    Estimating the total number of phosphoproteins and phosphorylation sites in eukaryotic proteomes

    Background: Phosphorylation is the most frequent post-translational modification made to proteins and may regulate protein activity as either a molecular digital switch or a rheostat. Despite the cornucopia of high-throughput (HTP) phosphoproteomic data in the last decade, it remains unclear how many proteins are phosphorylated and how many phosphorylation sites (p-sites) can exist in total within a eukaryotic proteome. We present the first reliable estimates of the total number of phosphoproteins and p-sites for four eukaryotes (human, mouse, Arabidopsis, and yeast). Results: In all, 187 HTP phosphoproteomic datasets were filtered, compiled, and studied along with two low-throughput (LTP) compendia. Estimates of the number of phosphoproteins and p-sites were inferred by two methods: Capture-Recapture, and fitting the saturation curve of cumulative redundant vs. cumulative non-redundant phosphoproteins/p-sites. Estimates were also adjusted for different levels of noise within the individual datasets and other confounding factors. We estimate that in total, 13 000, 11 000, and 3000 phosphoproteins and 230 000, 156 000, and 40 000 p-sites exist in human, mouse, and yeast, respectively, whereas estimates for Arabidopsis were not as reliable. Conclusions: Most of the phosphoproteins have been discovered for human, mouse, and yeast, while the dataset for Arabidopsis is still far from complete. The datasets for p-sites are not as close to saturation as those for phosphoproteins. Integration of the LTP data suggests that current HTP phosphoproteomics can capture 70% to 95% of total phosphoproteins, but only 40% to 60% of total p-sites.
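    The capture-recapture idea named in this abstract can be illustrated with a minimal sketch. It assumes the simplest two-sample case (the Lincoln-Petersen estimator and Chapman's bias-corrected variant), which is not necessarily the exact procedure the authors used; the function names and the protein identifiers are hypothetical.

```python
def lincoln_petersen(first: set, second: set) -> float:
    """Two-sample Lincoln-Petersen estimate of total population size.

    `first` and `second` are sets of identifiers (e.g. phosphoproteins)
    detected in two independent experiments.
    """
    overlap = len(first & second)  # items "recaptured" in the second sample
    if overlap == 0:
        raise ValueError("no overlap between samples; estimate is undefined")
    return len(first) * len(second) / overlap


def chapman(first: set, second: set) -> float:
    """Chapman's variant, which is less biased for small samples
    and stays defined when the overlap is zero."""
    overlap = len(first & second)
    return (len(first) + 1) * (len(second) + 1) / (overlap + 1) - 1


# Hypothetical identifiers, for illustration only.
dataset_a = {"P1", "P2", "P3", "P4"}
dataset_b = {"P3", "P4", "P5", "P6", "P7", "P8"}

lp_estimate = lincoln_petersen(dataset_a, dataset_b)
chapman_estimate = chapman(dataset_a, dataset_b)
print(lp_estimate, chapman_estimate)
```

    The smaller the overlap between two datasets relative to their sizes, the larger the estimated total, which is why datasets far from saturation yield high and less stable estimates.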

    The Complex Evolutionary History of Aminoacyl-tRNA Synthetases

    Aminoacyl-tRNA synthetases (AARSs) are a superfamily of enzymes responsible for the faithful translation of the genetic code and have lately become a prominent target for synthetic biologists. Our large-scale analysis of >2500 prokaryotic genomes reveals the complex evolutionary history of these enzymes and their paralogs, in which horizontal gene transfer played an important role. These results show that a widespread belief in the evolutionary stability of this superfamily is misconceived. Although AlaRS, GlyRS, LeuRS, IleRS, ValRS are the most stable members of the family, GluRS, LysRS and CysRS often have paralogs, whereas AsnRS, GlnRS, PylRS and SepRS are often absent from many genomes. In the course of this analysis, highly conserved protein motifs and domains within each of the AARS loci were identified and used to build a web-based computational tool for the genome-wide detection of AARS coding sequences. This is based on hidden Markov models (HMMs) and is available together with a cognate database that may be used for specific analyses. The bioinformatics tools that we have developed may also help to identify new antibiotic agents and targets using these essential enzymes. These tools also may help to identify organisms with alternative pathways that are involved in maintaining the fidelity of the genetic code.

    Bioinformatics analysis, management and organization of biological data related to post-translational regulation

    Post-translational regulation is an important, fast, and energy-efficient level of gene regulation that has attracted the focus of many high-throughput technologies in the last 20 years. Post-translational modifications of amino acids, especially protein phosphorylation, play a pivotal role at this level of cellular regulation. Accordingly, this thesis focused on abundant, publicly available high-throughput protein phosphorylation and methylation data, with the aim of developing computational tools, bioinformatics methods, and pipelines to analyze them and transform raw data into biological knowledge about the properties of the eukaryotic phosphoproteome. During this thesis, phosphoproteomic and methylproteomic data were mined from the literature. An annotation tool and a database were developed to facilitate the mining and storage of these complex data, which were integrated with many other omic and evolutionary data. Statistical analyses of the gathered and filtered data allowed for a reliable estimate of the total number of phosphoproteins and phosphorylation sites in model eukaryotes. Furthermore, a focused, in-depth study of the yeast (S. cerevisiae) phosphoproteome revealed its pivotal role in central metabolism and identified key metabolic processes of biotechnological importance that may be manipulated in the future, with precision, by mutating key phosphorylation sites. Finally, neural networks were developed to predict phosphorylation and methylation sites and to predict potential meth-phos switches and/or clusters. The tools and analyses developed during this thesis may serve as a first step towards more advanced tools and methods that will integrate many other post-translational modifications in the future.

    The challenges of interpreting phosphoproteomics data: a critical view through the bioinformatics lens

    During the last decade, there has been great progress in high-throughput (HTP) phosphoproteomics, and hundreds or even thousands of phosphorylation sites (p-sites) can now be detected in a single experiment. This success is attributable to a combination of very sensitive mass spectrometry instruments, better phosphopeptide enrichment techniques, and bioinformatics software that is capable of detecting peptides and localizing p-sites. These new technologies have opened up a whole new level of gene regulation to be studied, with great potential for therapeutics and synthetic biology. Nevertheless, many challenges remain to be resolved; these concern the biases and noise of these proteomic technologies, the biological noise that is present, and the incompleteness of the current datasets. Despite these problems, the datasets published so far appear to represent a good sample of the complete phosphoproteome of some organisms and are capable of revealing their major properties.